Skip to content

Conversation

bentsherman
Copy link
Member

@bentsherman bentsherman commented Aug 27, 2025

This PR introduces a new syntax for process which uses typed inputs and outputs. The existing syntax is still supported.

This PR refactors several large classes -- namely ProcessConfig and TaskProcessor -- to better separate concerns and enable a v1 / v2 model for process inputs/outputs. When moving existing code to new files, I try to change it as little as possible to not break anything.

ProcessConfig refactor

The following new classes were spun out of ProcessConfig:

  • ProcessConfigV1 / ProcessConfigV2 extend ProcessConfig with the declared inputs / outputs based on legacy (v1) or typed (v2) semantics

  • ProcessDslV1 / ProcessDslV2 are builder DSLs for legacy / typed process definitions

  • ProcessConfigBuilder is an adapter for applying process configuration to a process definition

  • ProcessBuilder is the base builder class used by the above builders

TaskProcessor refactor

The following new classes were spun out of TaskProcessor:

  • TaskInputResolver implements the input file resolution from makeTaskContextStage2()

  • TaskOutputResolver implements the task output resolution logic for typed processes

  • TaskEnvCollector implements the output env/eval resolution from collectOutEnvMap()

  • TaskFileCollector implements the output file resolution from collectOutFiles()

Typed inputs / outputs

The following new classes implement the new behavior for typed inputs / outputs:

  • ProcessInputs and ProcessOutputs replace InputsList and OutputsList from the v1 model

  • ProcessInput and ProcessOutput replace all InParam and OutParam classes from the v1 model

  • ProcessFileInput and ProcessFileOutput replace FileInParam and FileOutParam in the v1 model

Backwards compatibility

The runtime supports both legacy (v1) and typed (v2) processes by creating the ProcessDef with either a ProcessConfigV1 or ProcessConfigV2.

ProcessDef, TaskProcessor, and TaskRun check this type to determine whether to use v1 or v2 semantics. An instanceof check is performed at these decision points:

if( config instanceof ProcessConfigV1 )
    // use legacy inputs/outputs
if( config instanceof ProcessConfigV2 )
    // use typed inputs/outputs

Based on initial work in #4553

TODO:

  • update docs
  • update tests
  • add e2e tests

Copy link

netlify bot commented Aug 27, 2025

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 742de80
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/68f11f0ed962ad0008befafe

@bentsherman bentsherman changed the title Typed processe Typed processes Aug 27, 2025
@bentsherman bentsherman marked this pull request as ready for review September 1, 2025 17:53
@bentsherman bentsherman requested review from a team as code owners September 1, 2025 17:53
@bentsherman bentsherman added this to the 25.10 milestone Sep 1, 2025
Copy link
Collaborator

@christopher-hakkaart christopher-hakkaart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good. I like the tutorial in particular. I think it's clear and includes the right amount of detail. Also, the order makes sense and it's a good length.

I will take a second pass and nit pick the language. In the meantime, I've added two high level comments. They are very minor.

Copy link
Member

@pditommaso pditommaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This a too big change compared to current syntax. I do not support this approach

@bentsherman bentsherman force-pushed the typed-processes branch 3 times, most recently from 25a80b1 to 0e7be56 Compare September 5, 2025 20:44
@bentsherman
Copy link
Member Author

Updated to use "phase 1" syntax, i.e. support for multiple input channels and tuple inputs

Copy link
Collaborator

@christopher-hakkaart christopher-hakkaart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bentsherman - I went through the tutorial in detail focusing on the language. I split everything into separate comments to hopefully make it easier to accept/reject.

I found some of the code blocks confusing as when I went into the example repo the code blocks didn't match what was in master branch. I'm fine with using the rnaseq-nf example and a little bit of difference is okay, but if anyone does what I tried to do it's hard to follow. Can we better align this? Alternatively, can we peel off this example from rnaseq-nf and start building an repo full of examples specifically for the docs? If might give a little more latitude for v1, v2, v3 of tutorials like this and allow better synergy between what is written and what's in the repo. Happy to hear your thoughts

Copy link
Collaborator

@christopher-hakkaart christopher-hakkaart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good!

I've made suggestions to thin words in some places.

I've also made suggestions to change some of the language in definition lists (what I messaged about earlier). This might stick out too much while the rest of the page is still what a developer does vs what the method does, i.e., "Return a" vs "Returns".

See what you think in the context of the page. If it sticks out too much reject and I'll make a PR later to standardize the page.

@bentsherman
Copy link
Member Author

Thanks Chris. I'm happy to go ahead with the recommend language for the typed process reference and we can update the rest of that page later on.

Copy link
Collaborator

@christopher-hakkaart christopher-hakkaart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ben, readings the docs again (particularly the guides and process-typed page), I have a few more suggestions:

  • Standardize language around the Nextflow VS Code extension. Maybe it can be a state it's Nextflow VS Code extension once, and then referred to as VS Code extension afterwards. I think it could be hammered as the Nextflow VS Code extension for clarity, but not a sticking point either way.
  • Framing in the migration guide. I think it could be more explanation, then steps for migration. I think a repeated structure and language here helps explain then instruct, make it clearer overall about what the changes are and how to implement them.
  • Reduce admonitions at the top of the page. It's a little jarring looking at the preview and seeing 3 admonitions before one sentence and a code block. I tried to make this text to make it a smoother transition into the page.

None of these are sticking points, so I've approved from a docs perspective. Once these are merged. There is some language on the migration notes page I would like to update, but I won't make a PR until after this is merged to keep down conflicts (although there probably won't be many).

@bentsherman
Copy link
Member Author

@claude please review

Copy link

claude bot commented Oct 13, 2025

Claude finished @bentsherman's task —— View job


Comprehensive Code Review

This is an excellent PR that introduces typed processes to Nextflow with a well-thought-out architecture. Here's my detailed analysis:

Architecture & Design ✅

ProcessConfig Refactoring: The refactoring of ProcessConfig into V1/V2 variants is exemplary:

  • Clean separation of concerns with minimal code duplication
  • ProcessConfigV1 handles legacy InputsList/OutputsList
  • ProcessConfigV2 handles new typed ProcessInputs/ProcessOutputs
  • Base class remains focused on core configuration logic

TaskProcessor Changes: Good extraction of responsibilities:

  • TaskInputResolver - handles file input resolution logic
  • TaskOutputResolver - handles typed output resolution
  • TaskEnvCollector - environment variable collection
  • TaskFileCollector - output file collection with proper input exclusion

Typed Input/Output Model: Well-designed type system:

  • ProcessInputs/ProcessOutputs provide clean interfaces
  • Support for nullable types with Path? syntax
  • Proper separation of parameters, environment, files, and stdin/topics
  • Good use of composition over inheritance

Backwards Compatibility ✅

The backwards compatibility approach is solid:

  • Runtime checks with instanceof ProcessConfigV1/V2 at decision points (TaskProcessor.groovy:1510-1514)
  • Legacy processes continue to work unchanged
  • V1/V2 models coexist cleanly without interference

Documentation & Tutorials ✅

Excellent documentation with:

  • Comprehensive tutorial in docs/tutorials/static-types.md with practical rnaseq-nf example
  • Complete reference documentation in docs/process-typed.md
  • Clear migration guide with specific examples
  • Good coverage of edge cases (nullable types, staging directives)

Test Coverage ✅

Strong test coverage including:

  • New unit tests for all extracted classes (TaskFileCollectorTest, TaskInputsResolverTest, etc.)
  • Integration tests with typed examples (collect-tuple-typed.nf, env-typed.nf, etc.)
  • Tests cover core functionality and edge cases

Implementation Quality ✅

Code Quality:

  • Clean separation of V1/V2 logic paths
  • Proper error handling and validation
  • Good use of Groovy idioms and existing patterns
  • Comprehensive type checking in new classes

Parser/AST Changes:

  • Grammar extensions in ScriptParser.g4 look correct for new syntax
  • AST nodes properly separated into V1/V2 variants
  • Visitor pattern implementation maintains consistency

Potential Areas for Improvement

  1. Error Messages: Consider enhancing error messages for type mismatches to be more user-friendly, especially for newcomers to typed processes.

  2. Performance: While the instanceof checks are minimal overhead, consider if there are hot paths where this could be optimized with a strategy pattern in the future.

  3. Documentation: Consider adding more examples of complex nested types and how they interact with existing Nextflow operators.

Minor Issues

  • Line docs/tutorials/static-types.md:45 has a typo: "nextflow_scham.json" should be "nextflow_schema.json"
  • Consider adding validation for unsupported type combinations early in the parsing phase

Overall Assessment

This is a well-executed feature addition that:

  • Maintains full backwards compatibility
  • Introduces powerful new capabilities
  • Has excellent test coverage and documentation
  • Uses sound software engineering principles
  • Follows established Nextflow patterns and conventions

The PR successfully delivers on its goals of enabling static type checking while preserving the existing user experience. The architecture is extensible and should serve as a solid foundation for future enhancements.

Recommendation: Approve - This is high-quality work that significantly enhances Nextflow's capabilities while maintaining its core principles.


@bentsherman
Copy link
Member Author

Regarding Claude's comments:

Error Messages: Consider enhancing error messages for type mismatches to be more user-friendly, especially for newcomers to typed processes.

Type checking errors will be implemented in the language server for now.

Performance: While the instanceof checks are minimal overhead, consider if there are hot paths where this could be optimized with a strategy pattern in the future.

Not opposed in principle, but likely overkill for now. In the final version we could end up with something like TaskProcessorV1/V2 instead, but we'll see.

Documentation: Consider adding more examples of complex nested types and how they interact with existing Nextflow operators.

We will continue to expand the docs, just wanted to add the basics with this PR.

@bentsherman
Copy link
Member Author

@pditommaso let me know if you have any remaining concerns or things you want to test before we merge

@pditommaso
Copy link
Member

This is huge 😬 Please, make a build using [e2e prod]

@bentsherman
Copy link
Member Author

e2e tests passed 👍

Copy link
Member

@pditommaso pditommaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Massive work, some little things to address

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@bentsherman bentsherman dismissed pditommaso’s stale review October 17, 2025 15:27

Feedback was addressed

@bentsherman bentsherman merged commit 30f3c3c into master Oct 17, 2025
24 checks passed
@bentsherman bentsherman deleted the typed-processes branch October 17, 2025 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DSL2 - emit tuples with optional values

4 participants